Determining tumor origin

ABSTRACT

The disclosure provides methods for the use of gene expression measurements to classify or identify among 54 cancer types in samples obtained from a subject in a clinical setting, such as in cases of formalin fixed, paraffin embedded (FFPE) samples.

RELATED APPLICATIONS

This application is related to U.S. patent application Ser. No. 11/145,307, filed Jun. 3, 2005, which claims priority to U.S. Provisional Patent Application 60/577,084, filed Jun. 4, 2004, and to application Ser. No. 11/422,056, filed Jun. 2, 2006, which claims priority to U.S. Provisional Patent Application 60/687,174, filed Jun. 3, 2005. All four applications are hereby incorporated in their entireties as if fully set forth.

FIELD OF THE DISCLOSURE

This disclosure relates to the use of gene expression to classify human tumors. The classification is performed by use of gene expression profiles, or patterns, of 50 or more, or optionally 5 or more, expressed sequences, where the sequences are expressed in more than one tumor type. The disclosure thus includes use of gene expression levels that overlap in more than one tumor type or tumors that arise from certain tissues. The disclosure also provides for the use of 50 or more, or optionally 5 or more, specific gene sequences, the expression of which is present in more than one tissue source of a tumor and so tumor or cancer type. The gene expression profiles, whether embodied in nucleic acid expression, protein expression, or other expression formats, may be used to determine a cell containing sample as containing tumor cells of a tissue type or from a tissue origin to permit a more accurate identification of the cancer and thus treatment thereof as well as the prognosis of the subject from whom the sample was obtained.

BRIEF SUMMARY OF THE DISCLOSURE

This disclosure relates to the use of gene expression measurements to classify or identify cancers and/or tumors in cell containing samples obtained from a subject in a clinical setting, such as in cases of formalin fixed, paraffin embedded (FFPE) samples as well as fresh samples, that have undergone none to little or minimal treatment (such as simply storage at a reduced, non-freezing, temperature), and frozen samples. The disclosure thus provides the ability to classify a sample under real-world conditions faced by hospital and other laboratories which conduct testing on clinical FFPE samples. The samples may be of a primary tumor sample or of a tumor that has resulted from a metastasis of another tumor. Alternatively, the sample may be a cytological sample, such as, but not limited to, cells in a blood sample. The disclosure may also be viewed as molecular profiling of an unknown cancer or tumor by predicting tissue of origin for the cancer or tumor.

In some cases of a tumor sample, the tumors may not have undergone classification by traditional pathology techniques, may have been initially classified but confirmation is desired, or have been classified as a “carcinoma of unknown primary” (CUP) or “tumor of unknown origin” (TUO) or “unknown primary tumor”. The need for confirmation is particularly relevant in light of the estimates of 5 to 10% misclassification using standard techniques. Thus the disclosure may be viewed as providing means for cancer identification, or CID, of a tumor or tumor sample as being one of a plurality of possible tumor types. The range of possible tumor types is disclosed herein, and includes tumor types that were not previously assignable to an unknown cancer or tumor type.

In a first aspect of the disclosure, the classification is performed by use of gene expression profiles, or patterns, of 5 or more, or optionally 50 or more, expressed sequences. The gene expression profiles, whether embodied in nucleic acid expression, protein expression, or other markers of gene expression, may be used to determine a cell containing sample as containing tumor cells of a tissue type or from a tissue origin to permit a more accurate identification of the cancer and thus treatment thereof as well as the prognosis of the subject from whom the sample was obtained.

The expression products of the expressed sequences may be found in multiple tumor types within a plurality, or group, of known possible tumor types as disclosed herein. The expression levels of the sequences may thus occur in more than one tumor type in the group. Additionally, the range of expression levels may overlap between known tumor types in the group. The disclosed methodology of classifying or identifying tumor types may also be applied to the classification or identification of the tissue source of a cell, such as a tumor or cancer cell.

The classification or identification may be performed by the comparison of gene expression profiles, or patterns, of 50 or more, or optionally 5 or more, expressed sequences in a tumor sample to the expression of the same expressed sequences in a plurality of known tumor types. At least one of the sequences is expressed in more than one of the known tumor types in the plurality of known tumor types. Optionally, the range of expression levels of the at least one sequence in one known tumor type overlaps with the range of expression levels of the same sequence in one or more other known tumor types in the plurality. In some cases, the overlap occurs with 5% or more, 10% or more, 15% or more, 20% or more, 25% or more, 30% or more, 35% or more, 40% or more, 45% or more, or 50% or more of the other known tumor types in the plurality.
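
As a non-limiting illustration, the proportion of other known tumor types whose expression range overlaps that of a given tumor type may be computed as sketched below. The sketch assumes that a "range" is the interval between the minimum and maximum observed level of one sequence within a tumor type; the function names and the example values are hypothetical and are not taken from the disclosure.

```python
def expression_range(values):
    """Return the (minimum, maximum) observed expression level."""
    return min(values), max(values)


def ranges_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_fraction(levels_by_type, tumor_type):
    """Fraction of the OTHER tumor types whose expression range for one
    sequence overlaps the range observed in `tumor_type`.

    `levels_by_type` maps a tumor type name to the list of expression
    levels measured for a single expressed sequence in that type.
    """
    target = expression_range(levels_by_type[tumor_type])
    others = [t for t in levels_by_type if t != tumor_type]
    hits = sum(
        ranges_overlap(target, expression_range(levels_by_type[t]))
        for t in others
    )
    return hits / len(others)


# Hypothetical expression levels of one sequence in five tumor types.
example_levels = {
    "type_A": [1.0, 3.0],
    "type_B": [2.0, 5.0],
    "type_C": [6.0, 8.0],
    "type_D": [0.0, 1.5],
    "type_E": [10.0, 12.0],
}
print(overlap_fraction(example_levels, "type_A"))
```

A returned value of 0.05 or more would correspond to the "5% or more" case recited above, and so on for the other recited proportions.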

In some embodiments, two or more, three or more, four or more, five or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 35 or more, 40 or more, 45 or more, 50 or more, 55 or more, 60 or more, or a majority of the expressed sequences are expressed in more than one of the known tumor types. Optionally, the range of expression levels of each of the commonly expressed sequences in one known tumor type overlaps with the range of expression levels of the same sequence in one or more other known tumor types in the plurality as described above.

In some embodiments, the disclosure is used to classify among a group of 54 known tumor or cancer types as a plurality. The classification may be performed with significant accuracy in a clinical setting. The disclosure is based in part on the surprising and unexpected discovery that 50 or more expressed sequences in the human genome are capable of classifying among 54 known tumor or cancer types, as well as subsets of those tumor types, in a meaningful manner. Additionally, five to 49 of the expressed sequences may be used to classify among subsets of the 54 known tumor or cancer types.

The disclosure is based in part on the discovery that it is not necessary to use supervised learning to identify gene sequences which are expressed in correlation with different tumor types. Thus the disclosure is based in part on the recognition that the expression levels of any 50 or more expressed sequences, even a random collection of expressed sequences, contain the information content necessary to classify, and so may be used to classify, a cell as being a tumor cell from a plurality or group of known tissues or tissue origins.
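
As a non-limiting illustration of drawing a random collection of expressed sequences, a subset may be sampled without replacement as sketched below. The function name, the identifiers of the form GENE_n, and the use of a seed for repeatability are assumptions made for the example only.

```python
import random


def random_gene_subset(gene_ids, size=50, seed=None):
    """Draw `size` distinct gene identifiers at random, without replacement.

    `seed` is optional and only makes the draw repeatable.
    """
    gene_ids = list(gene_ids)
    if size > len(gene_ids):
        raise ValueError("subset size exceeds the number of available genes")
    rng = random.Random(seed)
    return rng.sample(gene_ids, size)


# Hypothetical identifiers standing in for a filtered set of expressed sequences.
all_genes = [f"GENE_{i}" for i in range(1000)]
subset = random_gene_subset(all_genes, size=50, seed=0)
print(len(subset))
```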

In another aspect, the disclosure provides for the classifying of a cell containing sample as containing a tumor cell of a tissue type or origin by determining the expression levels of 5 or more, or optionally 50 or more, transcribed sequences and comparing the expression levels to those of the same transcribed sequences in a plurality or group of known tumor tissue types to classify the cell containing sample as containing a cancer (or tumor) cell of a type among the plurality of cancer (or tumor) types. To classify among 54 known cancer types, and subsets thereof, as few as any 5 or more, or optionally 50 or more, expressed sequences may be used for classification in a meaningful manner. The disclosure is also based in part on the observation that the expressed sequences need not only be those with expression levels that are evidently or highly correlated (directly, or indirectly through correlation with another expressed sequence) with one or more of the known tumor types as compared to other known tumors. Thus the disclosure provides, in a further embodiment, for the use of the expression levels of genes that are not expressed in strong or high correlation with one or more of the known tumor types for comparison to a tumor or cancer sample. In some cases, all of the genes used for classification may be non-correlates, or only a portion of the genes may be non-correlates. In some embodiments, at least 90%, 85%, 75%, 50% or 25% of the expression levels used are non-correlated with one or more of the known tumor types.

The disclosure may be practiced by assessing the expression levels of gene sequences where the sequences need not have been selected based on a correlation of their expression levels with members of a plurality of known cancer or tumor types. Thus as a non-limiting example, the gene sequences need not be selected based on their correlation values with cancer or tumor types or a ranking based on the correlation values. Additionally, the disclosure may be practiced with use of gene expression levels which are not necessarily correlated to one or more other gene expression level(s) used for classification. So in additional embodiments, the ability for the expression level of one expressed sequence to function in classification is not redundant with (is independent of) the ability of at least one other gene expression level used for classification.

The disclosure may be applied to identify the origin of a cancer in a patient in a wide variety of cases including, but not limited to, identification of the origin of a cancer in a clinical setting. In some embodiments, the identification is made by classification of a cell containing sample known to contain cancer cells, but the origin of those cells is unknown. In other embodiments, the identification is made by classification of a cell containing sample as containing one or more cancer cells followed by identification of the origin(s) of those cancer cell(s). In further embodiments, the disclosure is practiced with a sample from a subject with a previous history of cancer, and identification is made by classification of a cell as either being cancer from a previous origin of cancer or a new origin. Additional embodiments include those where multiple cancers are present in the same organ or tissue, and the disclosure is used to determine the origin of each cancer, as well as whether the cancers are of the same origin.

The disclosure is also based in part on the discovery that the expression levels of particular gene sequences can be used to classify among tumor types with greater accuracy than the expression levels of a random group of gene sequences. In one embodiment, the disclosure provides for the use of expression levels of a disclosed set of expressed sequences in the human genome to classify among 54 known cancer types with significant accuracy. The disclosure thus provides for the identification and use of gene expression patterns (or profiles or “signatures”) based on the expressed sequences as having information that may be used to identify the origin of the 54 cancer types. The disclosure also provides for the use of expression levels of these expressed sequences to classify among subsets of the 54 cancer types. Additionally, the disclosure provides for the use of the expression levels of subsets (such as 5 or more) of the disclosed expressed sequences to classify among subsets of the 54 cancer types. Depending on the number of tumor types, accuracies ranging from over 80% to 100% may be achieved.

The disclosure is based upon the expression levels of the gene sequences in a set of known tumor cells from different tissues and of different tumor types. These gene expression profiles (of gene sequences in the different known tumor cells/types), whether embodied in nucleic acid expression, protein expression, or other expression formats, may be compared to the expression levels of the same sequences in an unknown tumor sample to identify the sample as containing a tumor of a particular known type and/or a particular known origin or cell type. The disclosure provides, such as in a clinical setting, the advantages of a more accurate identification of a cancer and thus the treatment thereof as well as the prognosis, including survival and/or likelihood of cancer recurrence following treatment, of the subject from whom the sample was obtained.

The disclosure is also based in part on the discovery that use of 5 or more, or optionally 50 or more, expressed sequences as described herein as capable of classifying among two or more known tumor types necessarily and effectively eliminates one or more known tumor types from consideration during classification. This reflects the lack of a need to select genes with expression levels that are highly correlated with all tumor types within the range of the classification system. Stated differently, the disclosure may be practiced with a plurality of genes the expression levels of which are not highly correlated with any of the individual tumor types or multiple types in the group of tumor types being classified. This is in contrast to other approaches based upon the selection and use of highly correlated genes, which likely do not “rule out” other tumor types as opposed to “rule in” a tumor type based on the positive correlation.

The classification of a tumor sample as being one of the possible cancer types described herein to the exclusion of one or more other cancer types is of course made based upon a level of confidence as described below. Where the level of confidence is low, or an increase in the level of confidence is preferred, the classification can simply be made at the level of a particular tissue origin or cell type for the cancer in the sample. Alternatively, and where a tumor sample is not readily classified as a single tumor type, the disclosure permits the classification of the sample as one of a few possible cancer types described herein. This advantageously provides for the ability to reduce the number of possible tissue types, cell types, and tumor types to consider for selection and administration of therapy to the patient from whom the sample was obtained.

The disclosure provides a non-subjective means for the identification of the tissue source and/or cancer type of one or more cancers of an afflicted subject. Where subjective interpretation may have been previously used to determine the tissue source and/or cancer type, as well as the prognosis and/or treatment of the cancer based on that determination, the present disclosure provides objective gene expression patterns, which may be used alone or in combination with subjective criteria to provide a more accurate identification of cancer classification. The disclosure is particularly advantageously applied to samples of secondary or metastasized tumors, but any cell containing sample (including a primary tumor sample) for which the tissue source and/or tumor type is preferably determined by objective criteria may also be used with the disclosure. Of course the ultimate determination of class may be made based upon a combination of objective and non-objective (or subjective/partially subjective) criteria.

The disclosure includes its use as part of the clinical or medical care of a patient. Thus in addition to using an expression profile of genes as described herein to assay a cell containing sample from a subject afflicted with cancer to determine the tissue source and/or tumor type of the cancer, the profile may also be used as part of a method to determine the prognosis of the cancer in the subject. The classification of the tumor/cancer and/or the prognosis may be used to select or determine or alter the therapeutic treatment for said subject. Thus the classification methods of the disclosure may be directed toward the treatment of disease, which is diagnosed in whole or in part based upon the classification. Given the diagnosis, administration of an appropriate anti-tumor agent or therapy, or the withholding or alteration of an anti-tumor agent or therapy may be used to treat the cancer.

The details of one or more embodiments of the disclosure are set forth in the accompanying drawing and the description below. Other features and advantages of the disclosure will be apparent from the drawing and detailed description, and from the claims.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates the range of expression levels of a first transcribed sequence of the disclosure in 39 of the 54 disclosed tumor types.

FIG. 2 illustrates the range of expression levels of a second transcribed sequence of the disclosure in 39 of the 54 disclosed tumor types.

DEFINITIONS

As used herein, a “gene” is a polynucleotide that encodes a discrete product, whether RNA or proteinaceous in nature. It is appreciated that more than one polynucleotide may be capable of encoding a discrete product. The term includes alleles and polymorphisms of a gene that encodes the same product, or a functionally associated (including gain, loss, or modulation of function) analog thereof, based upon chromosomal location and ability to recombine during normal mitosis.

A “sequence” or “gene sequence” as used herein is a nucleic acid molecule or polynucleotide composed of a discrete order of nucleotide bases. The term includes the ordering of bases that encodes a discrete product (i.e. “coding region”), whether RNA or proteinaceous in nature. It is appreciated that more than one polynucleotide may be capable of encoding a discrete product. It is also appreciated that alleles and polymorphisms of the human gene sequences may exist and may be used in the practice of the disclosure to identify the expression level(s) of the gene sequences or an allele or polymorphism thereof. Identification of an allele or polymorphism depends in part upon chromosomal location and ability to recombine during mitosis.

An “expressed sequence” is a sequence that is transcribed by cellular processes within a cell. To detect an expressed sequence, a region of the sequence that is unique relative to other expressed sequences may be used. An expressed sequence may encode a polypeptide product or not be known to encode any product. So an expressed sequence may contain open reading frames or no open reading frames. Non-limiting examples of regions that may be used for detection include regions of about 8 or more, about 10 or more, about 12 or more, about 14 or more, about 16 or more, about 18 or more, about 20 or more, about 22 or more, about 24 or more, about 26 or more, about 28 or more, or about 30 or more contiguous nucleotides within an expressed sequence. The term “about” as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. The physical form of an expressed sequence may be an RNA molecule or the corresponding cDNA molecule.

The terms “correlate” or “correlation” or equivalents thereof refer to an association between expression of one or more genes and another event, such as, but not limited to, physiological phenotype or characteristic, such as tumor type.

A “polynucleotide” is a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA and RNA. It also includes known types of modifications including labels known in the art, methylation, “caps”, substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications such as uncharged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), as well as unmodified forms of the polynucleotide.

The term “amplify” is used in the broad sense to mean creating an amplification product, which can be made enzymatically with DNA or RNA polymerases. “Amplification,” as used herein, generally refers to the process of producing multiple copies of a desired sequence, particularly those of a sample. “Multiple copies” means at least 2 copies. A “copy” does not necessarily mean perfect sequence complementarity or identity to the template sequence. Methods for amplifying mRNA are generally known in the art, and include reverse transcription PCR (RT-PCR) and quantitative PCR (or Q-PCR) or real time PCR. Alternatively, RNA may be directly labeled as the corresponding cDNA by methods known in the art.

By “corresponding”, it is meant that a nucleic acid molecule shares a substantial amount of sequence identity with another nucleic acid molecule. Substantial amount means at least 95%, usually at least 98% and more usually at least 99%, and sequence identity is determined using the BLAST algorithm, as described in Altschul et al. (1990), J. Mol. Biol. 215:403-410 (using the published default setting, i.e. parameters w=4, t=17).

A “microarray” is a linear or two-dimensional or three dimensional (and solid phase) array of discrete regions, each having a defined area, formed on the surface of a solid support such as, but not limited to, glass, plastic, or synthetic membrane. The density of the discrete regions on a microarray is determined by the total numbers of immobilized polynucleotides to be detected on the surface of a single solid phase support, such as of at least about 50/cm², at least about 100/cm², or at least about 500/cm², up to about 1,000/cm² or higher. The arrays may contain less than about 500, about 1000, about 1500, about 2000, about 2500, or about 3000 immobilized polynucleotides in total. As used herein, a DNA microarray is an array of oligonucleotide or polynucleotide probes placed on a chip or other surfaces used to hybridize to amplified or cloned polynucleotides from a sample. Since the position of each particular group of probes in the array is known, the identities of sample polynucleotides can be determined based on their binding to a particular position in the microarray. As an alternative to the use of a microarray, an array of any size may be used in the practice of the disclosure, including an arrangement of one or more positions of a two-dimensional or three dimensional arrangement in a solid phase to detect expression of a single gene sequence. In some embodiments, a microarray for use with the present disclosure may be prepared by photolithographic techniques (such as synthesis of nucleic acid probes on the surface from the 3′ end) or by nucleic acid synthesis followed by deposition on a solid surface.

Where the disclosure relies upon the identification of gene expression, some embodiments of the disclosure determine expression by hybridization of mRNA, or an amplified or cloned version thereof, of a sample cell to a polynucleotide that is unique to a particular gene sequence. Polynucleotides of this type contain at least about 16, at least about 18, at least about 20, at least about 22, at least about 24, at least about 26, at least about 28, at least about 30, or at least about 32 consecutive basepairs of a gene sequence that is not found in other gene sequences. The term “about” as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. Other embodiments are polynucleotides of at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, at least or about 400, at least or about 450, or at least or about 500 consecutive bases of a sequence that is not found in other gene sequences. The term “about” as used in the preceding sentence refers to an increase or decrease of 10% from the stated numerical value. Longer polynucleotides may of course contain minor mismatches (e.g. via the presence of mutations) which do not affect hybridization to the nucleic acids of a sample. Such polynucleotides may also be referred to as polynucleotide probes that are capable of hybridizing to sequences of the genes, or unique portions thereof, described herein. Such polynucleotides may be labeled to assist in their detection. The sequences may be those of mRNA encoded by the genes, the corresponding cDNA to such mRNAs, and/or amplified versions of such sequences. In some embodiments of the disclosure, the polynucleotide probes are immobilized on an array, other solid support devices, or in individual spots that localize the probes.

In other embodiments of the disclosure, all or part of a gene sequence may be amplified and detected by methods such as the polymerase chain reaction (PCR) and variations thereof, such as, but not limited to, quantitative PCR (Q-PCR), reverse transcription PCR (RT-PCR), and real-time PCR (including as a means of measuring the initial amounts of mRNA copies for each sequence in a sample), optionally real-time RT-PCR or real-time Q-PCR. Such methods would utilize one or two primers that are complementary to portions of a gene sequence, where the primers are used to prime nucleic acid synthesis. The newly synthesized nucleic acids are optionally labeled and may be detected directly or by hybridization to a polynucleotide of the disclosure. The newly synthesized nucleic acids may be contacted with polynucleotides (containing sequences) of the disclosure under conditions which allow for their hybridization. Additional methods to detect the expression of expressed nucleic acids include RNAse protection assays, including liquid phase hybridizations, and in situ hybridization of cells.

Alternatively, and in further embodiments of the disclosure, gene expression may be determined by analysis of expressed protein in a cell sample of interest by use of one or more antibodies specific for one or more epitopes of individual gene products (proteins), or proteolytic fragments thereof, in said cell sample or in a bodily fluid of a subject. The cell sample may be one of breast cancer epithelial cells enriched from the blood of a subject, such as by use of labeled antibodies against cell surface markers followed by fluorescence activated cell sorting (FACS). Such antibodies may be labeled to permit their detection after binding to the gene product. Detection methodologies suitable for use in the practice of the disclosure include, but are not limited to, immunohistochemistry of cell containing samples or tissue, enzyme linked immunosorbent assays (ELISAs) including antibody sandwich assays of cell containing tissues or blood samples, mass spectroscopy, and immuno-PCR.

The terms “label” or “labeled” refer to a composition capable of producing a detectable signal indicative of the presence of the labeled molecule. Suitable labels include radioisotopes, nucleotide chromophores, enzymes, substrates, fluorescent molecules, chemiluminescent moieties, magnetic particles, bioluminescent moieties, and the like. As such, a label is any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means.

The term “support” refers to conventional supports such as beads, particles, dipsticks, fibers, filters, membranes and silane or silicate supports such as glass slides.

“Expression” and “gene expression” include transcription and/or translation of nucleic acid material. Expression levels of an expressed sequence may optionally be normalized by reference or comparison to the expression level(s) of one or more control expressed genes. These “normalization genes” have expression levels that are relatively constant in all members of the plurality or group of known tumor types.
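
As a non-limiting illustration of normalization against control genes, the sketch below subtracts the mean level of the normalization genes from every other measured level. It assumes that expression levels are on a logarithmic scale, so that subtraction corresponds to forming a ratio; the function name and the gene identifiers are hypothetical.

```python
def normalize_to_reference(sample, reference_genes):
    """Normalize one sample against its normalization genes.

    `sample` maps gene identifier -> expression level on a log scale
    (an assumption of this sketch). The mean level of the reference
    genes is subtracted from each non-reference gene.
    """
    if not reference_genes:
        raise ValueError("at least one normalization gene is required")
    ref_mean = sum(sample[g] for g in reference_genes) / len(reference_genes)
    return {
        gene: level - ref_mean
        for gene, level in sample.items()
        if gene not in reference_genes
    }


# Hypothetical log-scale measurements for one sample.
sample = {"GENE_A": 8.0, "GENE_B": 5.0, "REF_1": 6.0, "REF_2": 4.0}
print(normalize_to_reference(sample, ["REF_1", "REF_2"]))
```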

As used herein, the term “comprising” and its cognates are used in their inclusive sense; that is, equivalent to the term “including” and its corresponding cognates.

Conditions that “allow” an event to occur or conditions that are “suitable” for an event to occur, such as hybridization, strand extension, and the like, or “suitable” conditions are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event. Such conditions, known in the art and described herein, depend upon, for example, the nature of the nucleotide sequence, temperature, and buffer conditions. These conditions also depend on what event is desired, such as hybridization, cleavage, strand extension or transcription.

Sequence “mutation,” as used herein, refers to any sequence alteration in the sequence of a gene of interest disclosed herein in comparison to a reference sequence. A sequence mutation includes single nucleotide changes, or alterations of more than one nucleotide in a sequence, due to mechanisms such as substitution, deletion or insertion. Single nucleotide polymorphism (SNP) is also a sequence mutation as used herein. Because the present disclosure is based on the relative level of gene expression, mutations in non-coding regions of genes as disclosed herein may also be assayed in the practice of the disclosure.

“Detection” or “detecting” includes any means of detecting, including direct and indirect determination of the level of gene expression and changes therein.

Unless defined otherwise all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs.

DETAILED DESCRIPTION OF MODES OF PRACTICING THE DISCLOSURE

This disclosure provides methods for the use of gene expression information to classify cancers and/or tumors in a more objective manner than possible with conventional pathology techniques. The disclosure is based in part on the results of randomly reducing the number of gene sequences used to classify a tumor sample as one of a plurality of tumor types, such as the tumor types described below and in U.S. Patent Publications US 2006/0094035 and US 2007/0020655. A total of 16,948 genes, which were filtered down from a larger set based upon removal of genes that display low or constant signals in the samples used, were used for both cross-validation and prediction accuracies as described herein.
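
As a non-limiting illustration of removing genes that display low or constant signals, the sketch below keeps only genes whose mean level and variance across samples both reach a chosen minimum. The disclosure does not state the filtering criteria that were actually applied; the two thresholds, the function name, and the example values are assumptions made for the example only.

```python
import statistics


def filter_informative_genes(expression, min_mean=1.0, min_variance=0.5):
    """Return identifiers of genes that are neither low nor constant.

    `expression` maps gene identifier -> list of levels across samples.
    `min_mean` and `min_variance` are hypothetical cut-offs.
    """
    kept = []
    for gene, levels in expression.items():
        is_low = statistics.mean(levels) < min_mean
        is_constant = statistics.pvariance(levels) < min_variance
        if not is_low and not is_constant:
            kept.append(gene)
    return kept


# Hypothetical signals for three genes across three samples.
expression = {
    "GENE_LOW": [0.1, 0.2, 0.1],
    "GENE_FLAT": [5.0, 5.0, 5.0],
    "GENE_VARIABLE": [2.0, 6.0, 9.0],
}
print(filter_informative_genes(expression))
```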

Thus in a first aspect, the disclosure provides a method of classifying a cell containing sample as including a cancer or tumor cell of (or from) a type of tissue (or as being of a tissue origin). The method comprises determining or measuring the expression levels of 5 or more, or optionally 50 or more, transcribed sequences from cells in a cell containing sample obtained from a subject, and classifying the sample as containing tumor cells of a type of tissue from a plurality of tumor types based on the expression levels of said sequences in the cells of the sample in comparison to expression levels in known tumors. As used herein, “a plurality” refers to the state of two or more.

The classifying is based upon a comparison of the expression levels of the assayed transcribed sequences in the cells of the sample to their expression levels in known tumor samples and/or known non-tumor samples. Alternatively, the classifying is based upon a comparison of the expression levels of the assayed transcribed sequences to the expression of reference sequences in the same samples, relative to, or based on, the same comparison in known tumor samples and/or known non-tumor samples. So as a non-limiting example, the expression levels of the gene sequences may be determined in a set of known tumor samples to provide a database against which the expression levels detected or determined in a cell containing sample from a subject are compared. The expression level(s) of gene sequence(s) in a sample also may be compared to the expression level(s) of said sequence(s) in normal or non-cancerous cells, preferably from the same sample or subject. As described below and in embodiments of the disclosure utilizing Q-PCR or real time Q-PCR, the expression levels may be compared to expression levels of reference genes in the same sample or a ratio of expression levels may be used.
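
As a non-limiting illustration of comparing a sample against a database of known tumor samples, the sketch below ranks tumor types by the distance between the sample profile and the mean profile of each type. A nearest-centroid rule with Euclidean distance is used here purely for illustration and is not asserted to be the classification algorithm of the disclosure; the function names, tumor type labels, and values are hypothetical.

```python
import math


def type_centroids(database):
    """Mean expression profile per tumor type.

    `database` maps tumor type -> list of profiles, where each profile
    is a list of expression levels in a fixed gene order.
    """
    return {
        tumor_type: [sum(column) / len(profiles) for column in zip(*profiles)]
        for tumor_type, profiles in database.items()
    }


def rank_tumor_types(profile, database):
    """Return tumor types ordered from closest to farthest centroid."""
    centroids = type_centroids(database)
    distances = {t: math.dist(profile, c) for t, c in centroids.items()}
    return sorted(distances, key=distances.get)


# Hypothetical database of two tumor types measured on two genes.
database = {
    "type_1": [[1.0, 2.0], [1.2, 1.8]],
    "type_2": [[5.0, 5.0], [5.2, 4.8]],
}
print(rank_tumor_types([1.1, 2.1], database))
```

Returning a ranked list rather than a single label allows a sample that is not readily assigned to one tumor type to be reported as one of a few candidate types.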

The selection of expressed sequences to use may be random, or by selection based on various criteria. As one non-limiting example, the gene sequences may be selected based upon unsupervised learning, including clustering techniques. As another non-limiting example, selection may be to reduce or remove redundancy with respect to their ability to classify tumor type. For example, gene sequences are selected based upon the lack of correlation between their expression and the expression of one or more other gene sequences used for classifying. This is accomplished by assessing the expression level of each gene sequence in the expression data set for correlation, across the plurality of samples, with the expression level of each other gene in the data set to produce a correlation matrix of correlation coefficients. These correlation determinations may be performed directly, between expression of each pair of gene sequences, or indirectly, without direct comparison between the expression values of each pair of gene sequences.

A variety of correlation methodologies may be used in the correlation of expression data of individual gene sequences within the data set. Non-limiting examples include parametric and non-parametric methods as well as methodologies based on mutual information and non-linear approaches. Non-limiting examples of parametric approaches include Pearson correlation (or Pearson r, also referred to as linear or product-moment correlation) and cosine correlation. Non-limiting examples of non-parametric methods include Spearman's R (or rank-order) correlation, Kendall's Tau correlation, and the Gamma statistic. Each correlation methodology can be used to determine the level of correlation between the expressions of individual gene sequences in the data set. The correlation of all sequences with all other sequences is most readily considered as a matrix. Using Pearson's correlation as a non-limiting example, the correlation coefficient r in the method is used as the indicator of the level of correlation. When other correlation methods are used, the correlation coefficient analogous to r may be used, along with the recognition of equivalent levels of correlation corresponding to r being at or about 0.25 to being at or about 0.5.

The correlation coefficient may be selected as desired to reduce the number of correlated gene sequences to various numbers. In some embodiments of the disclosure using r, the selected coefficient value may be of about 0.25 or higher, about 0.3 or higher, about 0.35 or higher, about 0.4 or higher, about 0.45 or higher, or about 0.5 or higher. The selection of a coefficient value means that where expression between gene sequences in the data set is correlated at that value or higher, they are possibly not included in a subset of the disclosure. Thus in some embodiments, the method comprises excluding or removing (not using for classification) one or more gene sequences that are expressed in correlation, above a desired correlation coefficient, with another gene sequence in the tumor type data set. It is pointed out, however, that there can be situations of gene sequences that are not correlated with any other gene sequences, in which case they are not necessarily removed from use in classification.
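The correlation matrix and the exclusion of correlated gene sequences described above can be sketched as follows. This is a minimal, non-limiting illustration and not the procedure actually used to derive any disclosed gene set: it assumes Pearson r computed with NumPy, a greedy pass in column order, and comparison of the absolute value of r against the cutoff. The function name, gene labels, and expression values are hypothetical.

```python
import numpy as np


def select_uncorrelated(expr, gene_names, r_cutoff=0.5):
    """Keep gene sequences whose expression is not correlated with a kept one.

    expr:       array of shape (n_samples, n_genes) of expression levels
    gene_names: list of n_genes labels, in column order
    r_cutoff:   genes correlated at |r| >= r_cutoff with an already kept
                gene are excluded (assumption: absolute value of r is used)
    Returns (kept gene names, full correlation matrix).
    """
    # Pearson correlation of every gene with every other gene (genes x genes).
    corr = np.corrcoef(expr, rowvar=False)
    kept = []
    for j in range(corr.shape[0]):
        # Greedy: keep gene j only if it is below the cutoff against all kept genes.
        if all(abs(corr[j, k]) < r_cutoff for k in kept):
            kept.append(j)
    return [gene_names[j] for j in kept], corr


# Hypothetical data: 4 samples x 3 genes; GENE_B is exactly 2 x GENE_A.
expr = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
])
kept, corr = select_uncorrelated(expr, ["GENE_A", "GENE_B", "GENE_C"], r_cutoff=0.5)
print(kept)
```

In this toy data set GENE_B carries the same information as GENE_A and is excluded, while GENE_C, which is only weakly correlated with GENE_A, is retained.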

Thus the expression levels of gene sequences, where more than about 10%, more than about 20%, more than about 30%, more than about 40%, more than about 50%, more than about 60%, more than about 70%, more than about 80%, or more than about 90% of the levels are not correlated with that of another one of the gene sequences used, may be used in the practice of the disclosure. Correlation between expression levels may be based upon a value below about 0.9, about 0.8, about 0.7, about 0.6, about 0.5, about 0.4, about 0.3, or about 0.2. The ability to classify among classes with exclusion of the expression levels of some gene sequences is present because expression of the gene sequences in the subset is correlated with expression of the gene sequences excluded from the subset. So no information is lost, because information based on the expression of the excluded gene sequences is still represented by sequences retained in the subset. Therefore, expression of the gene sequences of the subset has information content relevant to properties and/or characteristics (or phenotype) of a cell. This has application and relevance to the classification of additional tumor type classes not included as part of the original gene expression data set, which can be classified by use of a subset of the disclosure because of the redundancy of information between expression of sequences in the subset and sequences expressed in those additional classes. Thus the disclosure may be used to classify cells as being a tumor type beyond the plurality of known classes used to generate the original gene expression data set.

Selection of gene sequences based upon reducing correlation of expression to a particular tumor type may also be used. This also reflects a discovery of the present disclosure, based upon the observation that expression levels that were most highly correlated with one or more tumor types were not necessarily of greatest value in classification among different tumor types. This is reflected both by the ability to use randomly selected gene sequences for classification as well as the use of particular sequences, as described herein, which are not expressed with the most significant correlation with one or more tumor types. Thus the disclosure may be practiced without selection of gene sequences based upon the most significant P values or a ranking based upon correlation of gene expression and one or more tumor types. Thus the disclosure may be practiced without the use of ranking based methodologies, such as the Kruskal-Wallis H-test.

The gene sequences used in the practice of the disclosure may include those which have been observed to be expressed in correlation with particular known tumor types, such as expression of the estrogen receptor, which has been observed to be expressed in correlation with some breast and ovarian cancers. In some embodiments of the disclosure, however, the disclosure is practiced with use of expression levels of multiple gene sequences where the expression levels overlap in two or more of the known tumor types. In some cases, one or more of the transcribed sequences have expression levels that overlap in all of the known tumor types, at least 50 of the known tumor types, at least 45 of the known tumor types, at least 40 of the known tumor types, at least 35 of the known tumor types, at least 30 of the known tumor types, at least 25 of the known tumor types, at least 20 of the known tumor types, at least 10 of the known tumor types, or at least 5 of the known tumor types of a disclosed group.

Used in the practice of the disclosed methods, the number of transcribed sequences that are expressed with a range of overlapping levels in two or more known tumor types of a disclosed group may be 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 11 or more, 12 or more, 13 or more, 14 or more, 15 or more, 16 or more, 17 or more, 18 or more, 19 or more, 20 or more, 22 or more, 24 or more, 26 or more, 28 or more, 30 or more, 32 or more, 34 or more, 36 or more, 38 or more, 40 or more, 42 or more, 44 or more, 46 or more, 48 or more, 50 or more, 52 or more, 54 or more, 56 or more, 58 or more, 60 or more, 62 or more, 64 or more, 66 or more, 68 or more, 70 or more, 72 or more, 74 or more, 76 or more, 78 or more, 80 or more, 92 or more, 94 or more, 96 or more, 98 or more, 100 or more, 105 or more, 110 or more, 120 or more, 130 or more, 140 or more, or 150 or more. Of course the above values may be used with one or more transcribed sequences with expression levels that do not overlap between two, or more, members of a group of known tumor types. Based upon the number of known tumor types in a plurality, the skilled person may practice the disclosure with any appropriate combination of a number of tumor types with overlapping expression of the sample transcribed sequences and a number of transcribed sequences with overlapping expression levels in two or more tumor types.

While the disclosure is described mainly with respect to human subjects, samples from other subjects may also be used. All that is necessary is the ability to assess the expression levels of gene sequences in a plurality of known tumor samples such that the expression levels in an unknown or test sample may be compared. Thus the disclosure may be applied to samples from any organism for which a plurality of expressed sequences, and a plurality of known tumor samples, are available. One non-limiting example is application of the disclosure to mouse samples, based upon the availability of the mouse genome to permit detection of expressed murine sequences and the availability of known mouse tumor samples or the ability to obtain known samples. Thus, the disclosure is contemplated for use with other samples, including those of mammals, primates, and animals used in clinical testing (such as rats, mice, rabbits, dogs, cats, and chimpanzees) as non-limiting examples.

While the disclosure is readily practiced with the use of cell containing samples, any nucleic acid containing sample which may be assayed for gene expression levels may be used in the practice of the disclosure. Without limiting the disclosure, a sample of the disclosure may be one that is suspected or known to contain tumor cells. Alternatively, a sample of the disclosure may be a “tumor sample” or “tumor containing sample” or “tumor cell containing sample” of tissue or fluid isolated from an individual suspected of being afflicted with, or at risk of developing, cancer. Non-limiting examples of samples for use with the disclosure include a clinical sample, such as, but not limited to, a fixed sample, a fresh sample, or a frozen sample. The sample may be an aspirate, a cytological sample (including blood or other bodily fluid, including fluid from an ascites or a pleural cavity), or a tissue specimen, which includes at least some information regarding the in situ context of cells in the specimen, so long as appropriate cells or nucleic acids are available for determination of gene expression levels. The disclosure is based in part on the discovery that results obtained with frozen tissue sections can be validly applied to the situation with fixed tissue or cell samples and extended to fresh samples.

Non-limiting examples of fixed samples include those that are fixed with formalin or formaldehyde (including FFPE samples), with Bouin's, glutaraldehyde, acetone, alcohols, or any other fixative, such as those used to fix cell or tissue samples for immunohistochemistry (IHC). Other examples include fixatives that precipitate cell associated nucleic acids and proteins. Given possible complications in handling frozen tissue specimens, such as the need to maintain their frozen state, the disclosure may be practiced with non-frozen samples, such as fixed samples, fresh samples, including cells from blood or other bodily fluid or tissue, and minimally treated samples. In some applications of the disclosure, the sample has not been classified using standard pathology techniques, such as, but not limited to, immunohistochemistry based assays.

In some embodiments of the disclosure, the sample is classified as containing a tumor cell of a type selected from the following 54, and subsets thereof: adrenal-cortical tumor, adrenal pheochromocytoma, tumor of the brain, adenocarcinoma of breast, cervical adenocarcinoma, cervical squamous cell carcinoma, cholangiocarcinoma, endometrial adenocarcinoma, esophageal squamous cell carcinoma, gastrointestinal stromal tumor, adenocarcinoma of gallbladder, gastro-esophageal adenocarcinoma, seminomatous germ cell tumor, nonseminomatous germ cell tumor, tumor of the salivary gland, squamous cell carcinoma, colorectal adenocarcinoma, small intestine adenocarcinoma, clear cell renal cell carcinoma, chromophobe renal cell carcinoma, papillary renal cell carcinoma, hepatocellular carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, lymphoma, melanoma, meningioma, mesothelioma, small/large cell neuroendocrine lung cancer, neuroendocrine-pancreas cancer, Merkel cell carcinoma, gastrointestinal carcinoid, lung carcinoid, clear cell adenocarcinoma, endometrioid adenocarcinoma, mucinous adenocarcinoma, serous adenocarcinoma, pancreatic adenocarcinoma, prostate adenocarcinoma, malignant fibrous histiocytoma, primitive neuroectodermal tumor, leiomyosarcoma, liposarcoma, osteosarcoma, synovial sarcoma, sex cord stromal tumor, basal cell carcinoma, skin squamous cell carcinoma, thymic carcinoma/thymoma, follicular/papillary carcinoma, medullary carcinoma, transitional cell carcinoma, adenocarcinoma of bladder, and squamous cell carcinoma of bladder.

These 54 tumor types correspond to the following tissue types: adrenal tissue (adrenal-cortical tumor and adrenal pheochromocytoma), brain tissue (tumor of the brain), breast tissue (adenocarcinoma of breast), cervical tissue (cervical adenocarcinoma and cervical squamous cell carcinoma), bile duct tissue (cholangiocarcinoma), endometrial tissue (endometrial adenocarcinoma), esophageal tissue (esophageal squamous cell carcinoma), gastrointestinal tissue (gastrointestinal stromal tumor or GIST), gall bladder tissue (adenocarcinoma of gall bladder), gastro-esophageal tissue (gastro-esophageal adenocarcinoma), germ cell tissue (seminomatous germ cell tumor and nonseminomatous germ cell tumor), head and neck tissue (tumor of the salivary gland and squamous cell carcinoma), intestinal tissue (colorectal adenocarcinoma and small intestine adenocarcinoma), kidney tissue (clear cell renal cell carcinoma, chromophobe renal cell carcinoma, and papillary renal cell carcinoma), liver tissue (hepatocellular carcinoma), lung tissue (lung adenocarcinoma and lung squamous cell carcinoma), lymphocytes (lymphoma), melanocytes (melanoma), meningeal tissue (meningioma), tissue of a mesothelium (mesothelioma), neuroendocrine tissue (small/large cell neuroendocrine lung cancer, neuroendocrine-pancreas cancer, Merkel cell carcinoma, gastrointestinal carcinoid, and lung carcinoid), ovary tissue (clear cell adenocarcinoma, endometrioid adenocarcinoma, mucinous adenocarcinoma, and serous adenocarcinoma), tissue of the pancreas (pancreatic adenocarcinoma), prostate tissue (prostate adenocarcinoma), tissue of a sarcoma (malignant fibrous histiocytoma or MFH, primitive neuroectodermal tumor or PNET, leiomyosarcoma, liposarcoma, osteosarcoma, and synovial sarcoma), sex cord stromal tissue (sex cord stromal tumor), skin tissue (basal cell carcinoma and skin squamous cell carcinoma), thymus tissue (thymic carcinoma/thymoma), thyroid tissue (follicular/papillary carcinoma and medullary carcinoma), and urinary bladder tissue (transitional cell carcinoma, adenocarcinoma of bladder, and squamous cell carcinoma of bladder).

The methods of the disclosure may also be applied to classify a cell containing sample as containing a tumor cell of a tumor of a subset of any of the above-listed types. The size of the subset may be small, composed of two, three, four, five, six, seven, eight, nine, or ten of the tumor types described above. Alternatively, the size of the subset may be any integral number up to the full size of the set. Thus embodiments of the disclosure include classification among 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54 of the above types. In some embodiments, the subset will be composed of tumor types that are of the same tissue or organ type. Alternatively, the subset will be composed of tumor types of different tissues or organs. It cannot be overemphasized that the disclosure is not based upon any particular combination of tumor types and that all possible combinations of the above 54 known types are expressly contemplated as embodiments of the disclosure. Although the number of combinations of the 54 tumor types is finite, explicitly writing out every combination would elevate arbitrary form over the substance of the discovery and disclosure.

While classification among some subsets of the above tumor types has been reported in U.S. Patent Publications US 2006/0094035 and US 2007/0020655 as well as Ma et al. (Arch. Pathol. Lab. Med., 130:465-473, 2006), it is believed that the instant disclosure is the first regarding successful identification of at least adrenal-cortical tumor, tumor of the salivary gland, squamous cell carcinoma, neuroendocrine-pancreas cancer, Merkel cell carcinoma, lung carcinoid, primitive neuroectodermal tumor, sex cord stromal tumor, thymic carcinoma/thymoma, adenocarcinoma of urinary bladder, and squamous cell carcinoma of urinary bladder. Therefore, and in some embodiments, a group of known tumor types would include one or more types from this list.

The disclosure may be practiced with the expression levels of about 10 or more, about 15 or more, about 20 or more, about 25 or more, about 30 or more, about 35 or more, about 40 or more, about 45 or more, or about 50 or more transcribed sequences as found in the human “transcriptome” (transcribed portion of the genome). In some embodiments of the disclosure, the transcribed genes may be randomly picked or include all or some of the specific gene sequences disclosed herein. Classification with accuracies of about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, or about 95% or higher can be performed by use of the instant disclosure.

In other embodiments, the gene expression levels of other gene sequences may be determined along with the above described determinations of expression levels for use in classification. One non-limiting example of this is seen in the case of a microarray based platform to determine gene expression, where the expression of other gene sequences is also measured. Where those other expression levels are not used in comparison to expression in known tumor types, they may be considered the results of “excess” transcribed sequences and not critical to the practice of the disclosure. Alternatively, and where those other expression levels are used in classification, they are within the scope of the disclosure, where the description of using particular numbers of sequences does not necessarily exclude the use of expression levels of additional sequences. In some embodiments, the disclosure includes the use of expression level(s) from one or more “excess” gene sequences, such as those which may provide information redundant to one or more other gene sequences used in a method of the disclosure.

Because classification of a sample as containing cells of one of the above tumor types inherently also classifies the tissue or organ site origin of the sample, the methods of the disclosure may be applied to classification of a tumor sample as being of a particular tissue or organ site of a subject from which the sample was obtained. This application of the disclosure is particularly useful in cases where the sample is of a tumor that is the result of metastasis by another tumor. In some embodiments of the disclosure, the tumor sample is classified as being one of the following 30 known tissue types: adrenal tissue, brain tissue, breast tissue, cervical tissue, bile duct tissue, endometrial tissue, esophageal tissue, gastrointestinal tissue, gall bladder tissue, gastro-esophageal tissue, germ cell tissue, head and neck tissue, intestinal tissue, kidney tissue, liver tissue, lung tissue, lymphocytes, melanocytes, meningeal tissue, tissue of a mesothelium, neuroendocrine tissue, ovary tissue, tissue of the pancreas, prostate tissue, tissue of a sarcoma, sex cord stromal tissue, skin tissue, thymus tissue, thyroid tissue, and urinary bladder tissue.

The classification of a cell containing sample as having a tumor cell of one of the disclosed 54 tumor types above inherently also classifies the tissue or organ site origin of the sample. For example, the identification of a sample as being cervical squamous cell carcinoma necessarily classifies the tumor as being of cervical origin, squamous cell type (and thus epithelial rather than non-epithelial in origin). It also means that the tumor was necessarily not germ cell in origin. Thus, the methods of the disclosure may be applied to classification of a tumor sample as being of a particular tissue or organ site of a subject or patient. This application of the disclosure is particularly useful in cases where the sample is of a tumor that is the result of metastasis by another tumor.
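Because each tumor type belongs to exactly one of the tissue types listed above, the step from a tumor type call to a tissue of origin can be viewed as a simple lookup. The sketch below is a non-limiting illustration that includes only an excerpt of the 54 tumor types, with pairings taken from the correspondence stated earlier; the dictionary and function names are hypothetical.

```python
# Excerpt of the tumor type -> tissue type correspondence stated above.
# Only 10 of the 54 tumor types are shown; the names are illustrative.
TISSUE_OF_TUMOR_TYPE = {
    "adrenal-cortical tumor": "adrenal tissue",
    "adrenal pheochromocytoma": "adrenal tissue",
    "cervical adenocarcinoma": "cervical tissue",
    "cervical squamous cell carcinoma": "cervical tissue",
    "cholangiocarcinoma": "bile duct tissue",
    "seminomatous germ cell tumor": "germ cell tissue",
    "nonseminomatous germ cell tumor": "germ cell tissue",
    "hepatocellular carcinoma": "liver tissue",
    "lung adenocarcinoma": "lung tissue",
    "lung squamous cell carcinoma": "lung tissue",
}


def tissue_of_origin(tumor_type):
    """Return the tissue type implied by a tumor type classification."""
    return TISSUE_OF_TUMOR_TYPE[tumor_type]


origin = tissue_of_origin("cervical squamous cell carcinoma")
print(origin)  # cervical tissue
```

A call of cervical squamous cell carcinoma thus yields cervical tissue and, by the same lookup, excludes germ cell tissue as the origin.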

The practice of the disclosure to classify a cell containing sample as having a tumor cell of one of the above types is by use of an appropriate classification algorithm that utilizes supervised learning to accept 1) the levels of expression of the gene sequences in a plurality of known tumor types as a training set and 2) the levels of expression of the same genes in one or more cells of a sample to classify the sample as having cells of one of the tumor types. Such algorithms are known to the skilled person and have been described elsewhere. The levels of expression may be provided based upon the signals in any format, including nucleic acid expression or protein expression as described herein.
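The two inputs described above, a training set of expression levels from known tumor types and the expression levels of the same genes in a test sample, can be illustrated with a very simple supervised classifier. The disclosure does not name a particular algorithm; the sketch below assumes a nearest-centroid rule with Euclidean distance purely for illustration, and the function names and all expression values are hypothetical.

```python
import numpy as np


def train_centroids(train_expr, train_labels):
    """Compute the mean expression profile (centroid) of each known tumor type.

    train_expr:   array of shape (n_samples, n_genes)
    train_labels: list of n_samples tumor type labels
    """
    labels = sorted(set(train_labels))
    label_arr = np.array(train_labels)
    centroids = np.vstack(
        [train_expr[label_arr == lab].mean(axis=0) for lab in labels]
    )
    return labels, centroids


def classify(sample_expr, labels, centroids):
    """Assign the tumor type whose centroid is closest to the sample profile."""
    distances = np.linalg.norm(centroids - sample_expr, axis=1)
    return labels[int(np.argmin(distances))]


# Hypothetical training set: 4 known samples x 3 gene sequences.
train_expr = np.array([
    [9.0, 1.0, 1.0],
    [8.0, 2.0, 1.0],
    [1.0, 9.0, 2.0],
    [2.0, 8.0, 1.0],
])
train_labels = ["lung adenocarcinoma", "lung adenocarcinoma", "melanoma", "melanoma"]

labels, centroids = train_centroids(train_expr, train_labels)
predicted = classify(np.array([7.5, 2.5, 1.0]), labels, centroids)
print(predicted)  # lung adenocarcinoma
```

Other supervised learning algorithms known to the skilled person may be substituted without changing the two inputs.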

Embodiments of the disclosure include use of the methods and materials described herein to identify the origin of a cancer from a patient. Thus given a sample containing tumor cells, the tissue origin of the tumor cells is identified by use of the present disclosure. One non-limiting example is in the case of a subject with an inflamed lymph node containing cancer cells. The cells may be from a tissue or organ that drains into the lymph node or they may be from another tissue source. The present disclosure may be used to classify the cells as being of a particular tumor or tissue type (or origin) which allows the identification of the source of the cancer cells. In an alternative non-limiting example, the sample (such as that from a lymph node) contains cells, which are first assayed by use of the disclosure to classify at least one cell as being a tumor cell of a tissue type or origin. This is then used to identify the source of the cancer cells in the sample. Both of these are examples of the advantageous use of the disclosure to save time, effort, and cost in the use of other cancer diagnostic tests.

In further embodiments, the disclosure is practiced with a sample from a subject with a previous history of cancer. As a non-limiting example, a cell containing sample (from the lymph node or elsewhere) of the subject may be found to contain cancer cells such that the present disclosure may be used to determine whether the cells are from the same or a different tissue from that of the previous cancer. This application of the disclosure may also be used to identify a new primary tumor, such as the case where new cancer cells are found in the liver of a subject who previously had breast cancer. The disclosure may be used to identify the new cancer cells as being the result of metastasis from the previous breast cancer (or from another tumor type, whether previously identified or not) or as a new primary occurrence of liver cancer. The disclosure may also be applied to samples of a tissue or organ where multiple cancers are found to determine the origin of each cancer, as well as whether the cancers are of the same origin.

While the disclosure may be practiced with the use of expression levels of a random group of expressed gene sequences, the disclosure also provides exemplary gene sequences for use in the practice of the disclosure. The disclosure includes a group of 87 gene sequences from which 5 or more may be used in the practice of the disclosure. The gene sequences may be used along with the determination of expression levels of additional sequences so long as the expression levels of gene sequences from the set of 87 are used in classifying. A non-limiting example of such embodiments of the disclosure is where the expression of 5 or more of the 87 gene sequences is measured along with the expression levels of a plurality of other sequences, such as by use of a microarray based platform used to perform a disclosed method. Where those other expression levels are not used in classification, they may be considered the results of “excess” transcribed sequences and optional to the practice of the disclosure. Alternatively, and where those other expression levels are used in classification, they are within the scope of the disclosure, where the use of the above described sequences does not necessarily exclude the use of expression levels of additional sequences.

Representative, and non-limiting, mRNA sequences corresponding to a set of 87 gene sequences for use in the practice of the disclosure have been previously reported in U.S. Patent Publications US 2006/0094035 and US 2007/0020655. The listing of identifying information, including accession numbers, Gene Symbols, and Description, is provided by the following table, where ATF indicates ascites tumor fluid, EGF is epidermal growth factor, and CLL is chronic lymphatic leukemia:

Accession  Gene Symbol  Description*
AA456140   PANX3        Pannexin 3
AA745593   BATF         Basic leucine zipper transcription factor, ATF-like
AA765597   SPRED2       Sprouty-related, EVH1 domain containing 2
AA782845   SLC35F3      Solute carrier family 35, member F3
AA865917                Hypothetical LOC389142
AA946776   FGF9         Fibroblast growth factor 9 (glia-activating factor)
AA993639   FLJ10748     Hypothetical protein FLJ10748
AB038160   TMPRSS3      Transmembrane protease, serine 3
AF104032   SLC7A5       Solute carrier family 7 (cationic amino acid transporter, y+ system), member 5
AF133587   RTDR1        Rhabdoid tumor deletion region gene 1
AF301598   EMX2         Empty spiracles homolog 2 (Drosophila)
AF332224   CYorf15A     Chromosome Y open reading frame 15A
AI041545   KDELR2       KDEL (Lys-Asp-Glu-Leu) endoplasmic reticulum protein retention receptor 2
AI147926   CSF2RB       Colony-stimulating factor 2 receptor, beta, low-affinity (granulocyte-macrophage)
AI309080   KCNJ11       Potassium inwardly rectifying channel, subfamily J, member 11
AI341378   CPEB2        Cytoplasmic polyadenylation element binding protein 2
AI457360   ERN2         Endoplasmic reticulum to nucleus signalling 2
AI620495   MEIS1        Meis1, myeloid ecotropic viral integration site 1 homolog (mouse)
AI632869   UPK1B        Uroplakin 1B
AI683181   PRDM6        PR domain containing 6
AI685931   KIBRA        KIBRA protein
AI802118   SLC6A13      Solute carrier family 6 (neurotransmitter transporter, GABA), member 13
AI804745
AI952953
AI985118   C14orf105    Chromosome 14 open reading frame 105
AJ000388   CAPN6        Calpain 6
AK025181   LOC91464     RAX-like homeobox
AK027147   TITF1        Hypothetical protein LOC253970
AK054605   FLJ11539     Hypothetical protein FLJ11539
AL023657   SH2D1A       SH2 domain protein 1A, Duncan disease (lymphoproliferative syndrome)
AL039118   FOXG1B       Forkhead box G1A
AL110274
AL157475   C8orf13      Chromosome 8 open reading frame 13
AW118445   CELSR2       Cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila)
AW194680   HOXD11       Homeobox D11
AW291189                Hypothetical LOC388416
AW298545   KIAA1904     KIAA1904 protein
AW445220   LY6K         Lymphocyte antigen 6 complex, locus K
AW473119   ESR1         Estrogen receptor 1
AY033998   ELAVL4       ELAV (embryonic lethal, abnormal vision, Drosophila)-like 4 (Hu antigen D)
BC000045   VGLL1        Vestigial like 1 (Drosophila)
BC001293   HOXC10       Homeobox C10
BC001504   PYCR1        Pyrroline-5-carboxylate reductase 1
BC001639   SLC43A1      Solute carrier family 43, member 1
BC002551   CDCA3        Cell division cycle associated 3
BC004331   HSDL2        Hydroxysteroid dehydrogenase like 2
BC004453   HTR3A        5-hydroxytryptamine (serotonin) receptor 3A
BC005364   C10orf59     Chromosome 10 open reading frame 59
BC006537   HOXA9        Homeobox A9
BC006881   PPARG        Peroxisome proliferative activated receptor, gamma
BC006819   S100P        S100 calcium binding protein P
BC008764   KIF2C        Kinesin family member 2C
BC008765   SDC1         Syndecan 1
BC009084   SELENBP1     Selenium binding protein 1
BC009237   TSHR         Thyroid-stimulating hormone receptor
BC010626   KIF12        Kinesin family member 12
BC011949   CA2          Carbonic anhydrase II
BC012926   EPS8L3       EPS8-like 3
BC013117   RGS17        Regulator of G-protein signalling 17
BC015754   CADPS        Ca²⁺-dependent secretion activator
BC017586   MGC26610     Calcyphosine-like
BE552004                CDNA FLJ44317 fis, clone TRACH3000586
BE962007   COX11        COX11 homolog, cytochrome c oxidase assembly protein (yeast)
BF224381                Hypothetical LOC400951
BF437393
BF446419   PCANAP6      Prostate cancer-associated protein 6
BF592799   PRKCQ        Protein kinase C, theta
BI493248   IBSP         Integrin-binding sialoprotein (bone sialoprotein, bone sialoprotein II)
H05388     ZNF365       Hypothetical protein LOC283045
H07885                  Transcribed locus
H09748     BCL11B       B-cell CLL/lymphoma 11B (zinc finger protein)
M95585     HLF          Hepatic leukemia factor
N64339     GJB6         Gap junction protein, beta 6 (connexin 30)
NM_000065  C6           Complement component 6
NM_001337  CX3CR1       Chemokine (C-X3-C motif) receptor 1
NM_003914  CCNA1        Cyclin A1
NM_004062  CDH16        Cadherin 16, KSP-cadherin
NM_004063  CDH17        Cadherin 17, LI cadherin (liver-intestine)
NM_004496  FOXA1        Forkhead box A1
NM_006115  PRAME        Preferentially expressed antigen in melanoma
NM_019894  TMPRSS4      Transmembrane protease, serine 4
NM_033229  TRIM15       Tripartite motif-containing 15
R15881     CHRM3        Cholinergic receptor, muscarinic 3
R45389                  CDNA clone IMAGE: 4797120
R61469                  Transcribed locus, moderately similar to NP_775622.1 hypothetical protein LOC270028 [Mus musculus]
X69699     PAX8         Paired box gene 8
X96757     MAP2K6       Mitogen-activated protein kinase kinase 6

As would be understood by the skilled person, detection of expression of any of the above identified sequences may be performed by the detection of expression of any appropriate portion or fragment of these sequences. Preferably, the portions are sufficiently large to contain unique sequences relative to other sequences expressed in a cell containing sample. Moreover, the skilled person would recognize that the disclosed sequences represent one strand of a double stranded molecule and that either strand may be detected as an indicator of expression of the disclosed sequences. This is because the disclosed sequences are expressed as RNA molecules in cells which are preferably converted to cDNA molecules for ease of manipulation and detection. The resultant cDNA molecules may have the sequences of the expressed RNA as well as those of the complementary strand thereto. Thus either the RNA sequence strand or the complementary strand may be detected. Of course, it is also possible to detect the expressed RNA without conversion to cDNA.

In some embodiments of the disclosure, the expression levels of gene sequences are measured by detection of expressed sequences in a cell containing sample as hybridizing to oligonucleotides of the disclosed gene sequences as indicated by the accession numbers provided.

In additional embodiments, the disclosure provides for use of any number of the gene sequences of the set of 87 in the methods of the disclosure. Thus any integral number from 1 to all of the 87 gene sequences may be used in the practice of the disclosure.

As used herein, a “tumor sample” or “tumor containing sample” or “tumor cell containing sample” or variations thereof, refers to cell containing samples of tissue or fluid isolated from an individual suspected of being afflicted with, or at risk of developing, cancer. The samples may contain tumor cells which may be isolated by known methods or other appropriate methods as deemed desirable by the skilled practitioner. These include, but are not limited to, microdissection, laser capture microdissection (LCM), or laser microdissection (LMD) before use in the instant disclosure. Alternatively, undissected cells within a “section” of tissue may be used. Non-limiting examples of such samples include primary isolates (in contrast to cultured cells) and may be collected by any non-invasive or minimally invasive means, including, but not limited to, ductal lavage, fine needle aspiration, needle biopsy, the devices and methods described in U.S. Pat. No. 6,328,709, or any other suitable means recognized in the art. Alternatively, the sample may be collected by an invasive method, including, but not limited to, surgical biopsy.

The detection and measurement of transcribed sequences may be accomplished by a variety of means known in the art or as deemed appropriate by the skilled practitioner. Essentially, any assay method may be used as long as the assay reflects, quantitatively or qualitatively, the level of expression of the transcribed sequence being detected.

The ability to classify tumor samples is provided by the recognition of the relevance of the level of expression of the gene sequences (whether randomly selected or specific) and not by the form of the assay used to determine the actual level of expression. An assay of the disclosure may utilize any identifying feature of an individual gene sequence as disclosed herein as long as the assay reflects, quantitatively or qualitatively, expression of the gene in the “transcriptome” (the transcribed fraction of genes in a genome) or the “proteome” (the translated fraction of expressed genes in a genome). Additional assays include those based on the detection of polypeptide fragments of the relevant member or members of the proteome. Non-limiting examples of the latter include detection of proteolytic fragments found in a biological fluid, such as blood or serum. Identifying features include, but are not limited to, unique nucleic acid sequences used to encode (DNA), or express (RNA), said gene or epitopes specific to, or activities of, a protein encoded by a gene sequence.

Additional means include detection of nucleic acid amplification as indicative of increased expression levels and nucleic acid inactivation, deletion, or methylation, as indicative of decreased expression levels. Stated differently, the disclosure may be practiced by assaying one or more aspects of the DNA template(s) underlying the expression of each gene sequence, of the RNA used as an intermediate to express the sequence, or of the proteinaceous product expressed by the sequence, as well as proteolytic fragments of such products. As such, the detection of the presence of, amount of, stability of, or degradation (including rate) of, such DNA, RNA and proteinaceous molecules may be used in the practice of the disclosure.

In some embodiments, all or part of a gene sequence may be amplified and detected by methods such as the polymerase chain reaction (PCR) and variations thereof, such as, but not limited to, quantitative PCR (Q-PCR), reverse transcription PCR (RT-PCR), and real-time PCR (including as a means of measuring the initial amounts of mRNA copies for each sequence in a sample), optionally real-time RT-PCR or real-time Q-PCR. Such methods would utilize one or two primers that are complementary to portions of a gene sequence, where the primers are used to prime nucleic acid synthesis. The newly synthesized nucleic acids are optionally labeled and may be detected directly or by hybridization to a polynucleotide of the disclosure. The newly synthesized nucleic acids may be contacted with polynucleotides (containing gene sequences) of the disclosure under conditions which allow for their hybridization. Additional methods to detect the expression of expressed nucleic acids include RNAse protection assays, including liquid phase hybridizations, and in situ hybridization of cells.

Alternatively, the expression of gene sequences in FFPE samples may be detected as disclosed in U.S. Pat. No. 7,364,846 B2 (which is hereby incorporated by reference as if fully set forth). Briefly, the expression of all or part of an expressed gene sequence or transcript may be detected by use of hybridization mediated detection (such as, but not limited to, microarray, bead, or particle based technology) or quantitative PCR mediated detection (such as, but not limited to, real time PCR and reverse transcriptase PCR) as non-limiting examples. The expression of all or part of an expressed polypeptide may be detected by use of immunohistochemistry techniques or other antibody mediated detection (such as, but not limited to, use of labeled antibodies that bind specifically to at least part of the polypeptide relative to other polypeptides) as non-limiting examples. Additional means for analysis of gene expression are available, including detection of expression within an assay for global, or near global, gene expression in a sample (e.g. as part of a gene expression profiling analysis such as on a microarray).

Embodiments using a nucleic acid based assay to determine expression include immobilization of one or more gene sequences on a solid support, including, but not limited to, a solid substrate as an array or to beads or bead based technology as known in the art. Alternatively, solution based expression assays known in the art may also be used. The immobilized gene sequence(s) may be in the form of polynucleotides that are unique or otherwise specific to the gene(s) such that the polynucleotides would be capable of hybridizing to the DNA or RNA of said gene(s). These polynucleotides may be the full length of the gene(s) or be short sequences of the genes (up to one nucleotide shorter than the full length sequence known in the art by deletion from the 5′ or 3′ end of the sequence) that are optionally minimally interrupted (such as by mismatches or inserted non-complementary basepairs) such that hybridization with a DNA or RNA corresponding to the genes is not affected. In some embodiments, the polynucleotides used are from the 3′ end of the gene, such as within about 350, about 300, about 250, about 200, about 150, about 100, or about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence. Polynucleotides containing mutations relative to the sequences of the disclosed genes may also be used so long as the presence of the mutations still allows hybridization to produce a detectable signal. Thus the practice of the present disclosure is unaffected by the presence of minor mismatches between the disclosed sequences and those expressed by cells of a subject's sample. A non-limiting example of the existence of such mismatches is seen in cases of sequence polymorphisms between individuals of a species, such as individual human patients within Homo sapiens.

As known by those skilled in the art, some gene sequences include 3′ poly A (or poly T on the complementary strand) stretches that do not contribute to the uniqueness of the disclosed sequences. The disclosure may thus be practiced with gene sequences lacking the 3′ poly A (or poly T) stretches. The uniqueness of the disclosed sequences refers to the portions or entireties of the sequences which are found only in nucleic acids, including unique sequences found at the 3′ untranslated portion thereof. Some unique sequences for the practice of the disclosure are those which contribute to the consensus sequences for the genes such that the unique sequences will be useful in detecting expression in a variety of individuals rather than being specific for a polymorphism present in some individuals. Alternatively, sequences unique to an individual or a subpopulation may be used. The unique sequences may be the lengths of polynucleotides of the disclosure as described herein.

In additional embodiments of the disclosure, polynucleotides having sequences present in the 3′ untranslated and/or non-coding regions of gene sequences are used to detect expression levels in cell containing samples of the disclosure. Such polynucleotides may optionally contain sequences found in the 3′ portions of the coding regions of gene sequences. Polynucleotides containing a combination of sequences from the coding and 3′ non-coding regions preferably have the sequences arranged contiguously, with no intervening heterologous sequence(s).

Alternatively, the disclosure may be practiced with polynucleotides having sequences present in the 5′ untranslated and/or non-coding regions of gene sequences to detect the level of expression in cells and samples of the disclosure. Such polynucleotides may optionally contain sequences found in the 5′ portions of the coding regions. Polynucleotides containing a combination of sequences from the coding and 5′ non-coding regions may have the sequences arranged contiguously, with no intervening heterologous sequence(s). The disclosure may also be practiced with sequences present in the coding regions of gene sequences.

The polynucleotides of some embodiments contain sequences from 3′ or 5′ untranslated and/or non-coding regions of at least about 16, at least about 18, at least about 20, at least about 22, at least about 24, at least about 26, at least about 28, at least about 30, at least about 32, at least about 34, at least about 36, at least about 38, at least about 40, at least about 42, at least about 44, or at least about 46 consecutive nucleotides. The term “about” as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. Other embodiments use polynucleotides containing sequences of at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides. The term “about” as used in the preceding sentence refers to an increase or decrease of 10% from the stated numerical value.
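
As a non-limiting illustration of the two meanings of “about” given above, the following minimal Python sketch tests whether a polynucleotide length satisfies a stated threshold. The function names are hypothetical, and reading the tolerance as lowering the minimum qualifying length is an assumption made only for this illustration.

```python
def meets_short_threshold(length: int, stated: int) -> bool:
    """'At least about N' for the 16 to 46 nucleotide series.

    'About' is an increase or decrease of 1, so the lowest qualifying
    length is taken here to be N - 1 (an illustrative assumption).
    """
    return length >= stated - 1


def meets_long_threshold(length: int, stated: int) -> bool:
    """'At least or about N' for the 50 to 400 nucleotide series.

    'About' is an increase or decrease of 10%, so the lowest qualifying
    length is taken here to be 0.9 * N (an illustrative assumption).
    Integer arithmetic avoids floating point rounding at the boundary.
    """
    return 10 * length >= 9 * stated
```

Under these assumptions, a 15 nucleotide sequence satisfies “at least about 16”, and a 45 nucleotide sequence satisfies “at least or about 50”.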

Sequences from the 3′ or 5′ end of gene coding regions as found in polynucleotides of the disclosure are of the same lengths as those described above, except that they would naturally be limited by the length of the coding region. The 3′ end of a coding region may include sequences up to the 3′ half of the coding region. Conversely, the 5′ end of a coding region may include sequences up to the 5′ half of the coding region. Of course the above described sequences, or the coding regions and polynucleotides containing portions thereof, may be used in their entireties.

In another embodiment of the disclosure, polynucleotides containing deletions of nucleotides from the 5′ and/or 3′ end of gene sequences may be used. The deletions are preferably of 1-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-125, 125-150, 150-175, or 175-200 nucleotides from the 5′ and/or 3′ end, although the extent of the deletions would naturally be limited by the length of the sequences and the need to be able to use the polynucleotides for the detection of expression levels.

Other polynucleotides of the disclosure from the 3′ end of gene sequences include those of primers and optional probes for quantitative PCR. Preferably, the primers and probes are those which amplify a region less than about 750, less than about 700, less than about 650, less than about 600, less than about 550, less than about 500, less than about 450, less than about 400, less than about 350, less than about 300, less than about 250, less than about 200, less than about 150, less than about 100, or less than about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence. The size of a PCR amplicon of the disclosure may be of any size, including at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides, all with inclusion of the portion complementary to the PCR primers used.

Other polynucleotides for use in the practice of the disclosure include those that have sufficient homology to gene sequences to detect their expression by use of hybridization techniques. Such polynucleotides preferably have about or 95%, about or 96%, about or 97%, about or 98%, or about or 99% identity with the gene sequences to be used. Identity is determined using the BLAST algorithm, as described above. The other polynucleotides for use in the practice of the disclosure may also be described on the basis of the ability to hybridize to polynucleotides of the disclosure under stringent conditions of about 30% v/v to about 50% formamide and from about 0.01M to about 0.15M salt for hybridization and from about 0.01M to about 0.15M salt for wash conditions at about 55 to about 65° C. or higher, or conditions equivalent thereto.
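
By way of a simplified, non-limiting illustration of a percent identity figure such as the 95% to 99% values above, the following Python sketch counts matching positions between two sequences that have already been aligned to equal length. It is not the BLAST algorithm, which also finds the alignment itself; the function name and the use of “-” as a gap character are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions at which both sequences carry the same base.

    Both inputs must already be aligned to the same length; a '-' marks a
    gap and never counts as a match. Simplified stand-in, not BLAST.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper())
        if base_a == base_b and base_a != "-"
    )
    return 100.0 * matches / len(aligned_a)
```

For example, two aligned 20 nucleotide sequences differing at a single position give 95.0 under this definition.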

In a further embodiment of the disclosure, a population of single stranded nucleic acid molecules comprising one or both strands of a human gene sequence is provided as a probe such that at least a portion of said population may be hybridized to one or both strands of a nucleic acid molecule quantitatively amplified from RNA of a cell or sample of the disclosure. The population may be only the antisense strand of a human gene sequence such that a sense strand of a molecule from, or amplified from, a cell may be hybridized to a portion of said population. The population preferably comprises a sufficiently excess amount of said one or both strands of a human gene sequence in comparison to the amount of expressed (or amplified) nucleic acid molecules containing a complementary gene sequence.

The disclosure further provides a method of classifying a human tumor sample by detecting the expression levels of 5 or more, optionally 50 or more, transcribed sequences in a nucleic acid or cell containing sample obtained from a human subject, and classifying the sample as containing a tumor cell of a tumor type found in humans to the exclusion of one or more other human tumor types. In some embodiments, the method may be used to classify a sample as being, or having cells of, one of the 54 tumor types listed above to the exclusion of one or more of the others.

The disclosure also provides a method for classifying tumor samples as being one of a subset of the possible tumor types described herein by detecting the expression levels of 5 or more, or optionally 50 or more, transcribed sequences in a nucleic acid containing tumor sample obtained from a human subject, and classifying the sample as being one of a number of tumor types found in humans to the exclusion of one or more other human tumor types. In some embodiments of the disclosure, the number of other tumor types is from 1 to about 3, more preferably from 1 to about 5, from 1 to about 7, or from 1 to about 9 or about 10. In other embodiments, the tumor types are all of the same tissue or organ origin such as those listed above.

In additional embodiments, the disclosure may be practiced by analyzing gene expression from single cells or homogenous cell populations which have been dissected away from, or otherwise isolated or purified from, contaminating cells of a sample as present in a simple biopsy. One advantage provided by these embodiments is that contaminating, non-tumor cells (such as infiltrating lymphocytes or other immune system cells) may be removed and so be absent from affecting the genes identified or the subsequent analysis of gene expression levels as provided herein. Such contamination is present where a biopsy is used to generate gene expression profiles.

In further embodiments of the disclosure utilizing Q-PCR or reverse transcriptase Q-PCR as the assay platform, the expression levels of gene sequences of the disclosure may be compared to expression levels of reference genes in the same sample or a ratio of expression levels may be used. This provides a means to “normalize” the expression data for comparison of data on a plurality of known tumor types and a cell containing sample to be assayed. Moreover, the Q-PCR may be performed in whole or in part with use of a multiplex format.
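
As a non-limiting sketch of normalizing to reference genes in the same sample, the following Python example subtracts the mean threshold cycle (Ct) of the reference genes from the Ct of each gene of interest. This delta-Ct calculation is a commonly used approach and is assumed here for illustration; it is not stated to be the normalization of the disclosure. The reference gene names REF1 and REF2 and all Ct values are hypothetical.

```python
from statistics import mean


def normalize_to_reference(ct_values, reference_genes):
    """Return delta-Ct (gene Ct minus mean reference Ct) for each non-reference gene.

    ct_values maps gene name to its measured Ct in one sample.
    A lower delta-Ct indicates higher expression relative to the references.
    """
    missing = [gene for gene in reference_genes if gene not in ct_values]
    if missing:
        raise KeyError(f"no Ct value for reference gene(s): {missing}")
    reference_ct = mean(ct_values[gene] for gene in reference_genes)
    return {
        gene: ct - reference_ct
        for gene, ct in ct_values.items()
        if gene not in reference_genes
    }


# Hypothetical Ct values for a single sample.
sample_ct = {"PANX3": 30.0, "BATF": 26.0, "REF1": 20.0, "REF2": 22.0}
normalized = normalize_to_reference(sample_ct, ["REF1", "REF2"])
```

Because every sample is expressed relative to its own reference genes, values normalized in this way can be compared between a sample to be assayed and samples of known tumor types.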

In an additional aspect, the methods provided by the present disclosure may also be automated in whole or in part. This includes the embodiment of the disclosure in software. Non-limiting examples include processor executable instructions on one or more computer readable storage devices wherein said instructions direct the classification of tumor samples based upon gene expression levels as described herein. Additional processor executable instructions on one or more computer readable storage devices are contemplated wherein said instructions cause representation and/or manipulation, via a computer output device, of the process or results of a classification method.

The disclosure includes software and hardware embodiments wherein the gene expression data of a set of gene sequences in a plurality of known tumor types is embodied as a data set. In some embodiments, the gene expression data set is used for the practice of a method of the disclosure. The disclosure also provides computer related means and systems for performing the methods disclosed herein. In some embodiments, an apparatus for classifying a cell containing sample is provided. Such an apparatus may comprise a query input configured to receive a query; a storage configured to store a gene expression data set, as described herein, received from a query input; and a module for accessing and using data from the storage in a classification algorithm as described herein. The apparatus may further comprise a string storage for the results of the classification algorithm, optionally with a module for accessing and using data from the string storage in an output algorithm as described herein.
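
The storage and classification module of such an apparatus may be sketched, in a non-limiting manner, as follows. The class and function names are hypothetical, the expression values used with it would be supplied by the user, and the nearest-centroid rule with Euclidean distance is an assumption chosen for brevity rather than the classification algorithm of the disclosure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ExpressionStorage:
    """Hypothetical storage for a gene expression data set of known tumor types.

    Maps each tumor type to the expression profiles of its known samples;
    every profile lists expression levels in the same gene order.
    """
    profiles: dict = field(default_factory=dict)

    def add(self, tumor_type, profile):
        """Store one known sample's profile under its tumor type."""
        self.profiles.setdefault(tumor_type, []).append(list(profile))

    def centroid(self, tumor_type):
        """Mean expression level per gene across the stored samples of one type."""
        samples = self.profiles[tumor_type]
        return [sum(column) / len(samples) for column in zip(*samples)]


def classify(storage, query_profile):
    """Return the stored tumor type whose centroid is nearest to the query profile."""
    if not storage.profiles:
        raise ValueError("storage holds no known tumor types")
    return min(
        storage.profiles,
        key=lambda tumor_type: math.dist(storage.centroid(tumor_type), query_profile),
    )
```

In use, profiles of known tumor types are added to the storage, a query profile measured over the same gene sequences is passed to `classify`, and the returned tumor type may be held as the stored result for subsequent output.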

The steps of a method, process, or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The various steps or acts in a method or process may be performed in the order shown, or may be performed in another order. Additionally, one or more process or method steps may be omitted or one or more process or method steps may be added to the methods and processes. An additional step, block, or action may be added in the beginning, end, or intervening existing elements of the methods and processes.

A further aspect of the disclosure provides for the use of the present disclosure in relation to clinical activities. In some embodiments, the determination or measurement of gene expression as described herein is performed as part of providing medical care to a patient, including the providing of diagnostic services in support of providing medical care. Thus the disclosure includes a method in the medical care of a patient, the method comprising determining or measuring expression levels of gene sequences in a cell containing sample obtained from a patient as described herein. The method may further comprise the classifying of the sample, based on the determination/measurement, as including a tumor cell of a tumor type or tissue origin in a manner as described herein. The determination and/or classification may be for use in relation to any aspect or embodiment of the disclosure as described herein.

The determination or measurement of expression levels may be preceded by a variety of related actions. In some embodiments, the measurement is preceded by a determination or diagnosis of a human subject as in need of said measurement. The measurement may be preceded by a determination of a need for the measurement, such as that by a medical doctor, nurse or other health care provider or professional, or those working under their instruction, or personnel of a health insurance or maintenance organization in approving the performance of the measurement as a basis to request reimbursement or payment for the performance.

The measurement may also be preceded by preparatory acts necessary to the actual measuring. Non-limiting examples include the actual obtaining of a cell containing sample from a human subject; or receipt of a cell containing sample; or sectioning a cell containing sample; or isolating cells from a cell containing sample; or obtaining RNA from cells of a cell containing sample; or reverse transcribing RNA from cells of a cell containing sample. The sample may be any as described herein for the practice of the disclosure.

The disclosure further provides kits for the determination or measurement of gene expression levels in a cell containing sample as described herein. A kit will typically comprise one or more reagents to detect gene expression as described herein for the practice of the present disclosure. Non-limiting examples include polynucleotide probes or primers for the detection of expression levels, one or more enzymes used in the methods of the disclosure, and one or more tubes for use in the practice of the disclosure. In some embodiments, the kit will include an array, or solid media capable of being assembled into an array, for the detection of gene expression as described herein. In other embodiments, the kit may comprise one or more antibodies that are immunoreactive with epitopes present on a polypeptide which indicates expression of a gene sequence. In some embodiments, the antibody will be an antibody fragment.

A kit of the disclosure may also include instructional materials disclosing or describing the use of the kit or a primer or probe of the present disclosure in a method of the disclosure as provided herein. A kit may also include additional components to facilitate the particular application for which the kit is designed. Thus, for example, a kit may additionally contain means of detecting the label (e.g. enzyme substrates for enzymatic labels, filter sets to detect fluorescent labels, appropriate secondary labels such as a sheep anti-mouse-HRP, or the like). A kit may additionally include buffers and other reagents recognized for use in a method of the disclosure.

Having now generally provided the disclosure, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the disclosure, unless specified.

EXAMPLES

Example 1 Expression Levels of PANX3 (pannexin 3)

The expression level of PANX3 (pannexin 3) in multiple samples of a plurality of 39 known tumor types from human subjects was determined and the results are shown in FIG. 1. The range of the expression level in each tumor type overlapped significantly in 38 of the 39 tumor types.

Example 2 Expression Levels of BATF

The expression level of BATF (basic leucine zipper transcription factor, ATF-like) in multiple samples of the plurality of 39 known tumor types of Example 1 was determined and the results are shown in FIG. 2. The range of the expression level in each tumor type overlapped among members of all 39 tumor types.

Example 3 Expression Levels of Additional Transcribed Sequences

The expression levels of the additional 85 transcribed sequences as disclosed herein were determined in the same manner as Examples 1 and 2 across the same 39 tumor types. The ranges of expression levels for each transcribed sequence were observed to overlap between multiple tumor types.

Example 4 Expression Levels of Transcribed Sequences in Additional Tumor Types

The expression levels of each of the 87 transcribed sequences as disclosed herein were determined in the same manner as Examples 1 and 2 across the disclosed 54 tumor types. The ranges of expression levels for each transcribed sequence were observed to overlap between multiple tumor types.
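
The notion of overlapping expression ranges used in the foregoing Examples may be illustrated, without limitation, by the following Python sketch, which lists the pairs of tumor types whose ranges for one transcribed sequence share at least one value. Taking the range as the minimum to maximum observed level is an assumption for illustration; the function names are hypothetical, and the sketch does not reproduce the analysis behind FIG. 1 or FIG. 2.

```python
from itertools import combinations


def expression_range(levels):
    """Return (lowest, highest) observed expression level for one tumor type."""
    return (min(levels), max(levels))


def ranges_overlap(range_a, range_b):
    """True when two closed intervals share at least one value."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


def overlapping_pairs(levels_by_type):
    """List the pairs of tumor types whose expression ranges overlap.

    levels_by_type maps tumor type to the expression levels of one
    transcribed sequence measured in that type's samples.
    """
    ranges = {
        tumor_type: expression_range(levels)
        for tumor_type, levels in levels_by_type.items()
    }
    return [
        (first, second)
        for first, second in combinations(sorted(ranges), 2)
        if ranges_overlap(ranges[first], ranges[second])
    ]
```

A sequence whose range in one tumor type overlaps with no other type would yield no pair containing that type.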

All references cited herein, including patents, patent applications, and publications, are hereby incorporated by reference in their entireties, whether previously specifically incorporated or not.

Having now fully described the inventive subject matter, it will be appreciated by those skilled in the art that the same can be performed within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the disclosure and without undue experimentation.

While this disclosure has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains and as may be applied to the essential features hereinbefore set forth.

1. A method of amplifying transcribed sequences for classifying a cell containing sample as containing tumor cells of a type of tissue, said method comprising producing cDNA copies of 50 or more transcribed sequences from cells in a cell containing sample obtained from a human subject, amplifying said cDNA copies to produce amplified molecules, comparing said expression levels of 50 or more transcribed sequences to expression levels of the same 50 or more transcribed sequences in a plurality of known tumor types comprising one or more known tumor tissues selected from adrenal-cortical tumor, tumor of the salivary gland, squamous cell carcinoma, neuroendocrine-pancreas cancer, merkel cell carcinoma, lung carcinoid, primitive neuroectodermal tumor, sex cord stromal tumor, thymic carcinoma/thymoma, adenocarcinoma of urinary bladder, and squamous cell carcinoma of urinary bladder; and classifying the sample as containing or not containing tumor cells of a tumor type or tissue in said plurality.
2. The method of claim 1 wherein the plurality comprises adrenal-cortical tumor, tumor of the salivary gland, squamous cell carcinoma, neuroendocrine-pancreas cancer, merkel cell carcinoma, lung carcinoid, primitive neuroectodermal tumor, sex cord stromal tumor, thymic carcinoma/thymoma, adenocarcinoma of urinary bladder, and squamous cell carcinoma of urinary bladder, and said classifying is of the sample as containing or not containing tumor cells of an adrenal-cortical tumor, tumor of the salivary gland, squamous cell carcinoma, neuroendocrine-pancreas cancer, merkel cell carcinoma, lung carcinoid, primitive neuroectodermal tumor, sex cord stromal tumor, thymic carcinoma/thymoma, adenocarcinoma of urinary bladder, and squamous cell carcinoma of urinary bladder.
3. The method of claim 2 wherein the plurality comprises adrenal-cortical tumor, adrenal pheochromocytoma, tumor of the brain, adenocarcinoma of breast, cervical adenocarcinoma, cervical squamous cell carcinoma, cholangiocarcinoma, endometrial adenocarcinoma, esophageal squamous cell carcinoma, gastrointestinal stromal tumor, adenocarcinoma of gallbladder, gastro-esophageal adenocarcinoma, seminomatous germ cell tumor, nonseminomatous germ cell tumor, tumor of the salivary gland, squamous cell carcinoma, colorectal adenocarcinoma, small intestine adenocarcinoma, clear cell renal cell carcinoma, chromophobe renal cell carcinoma, papillary renal cell carcinoma, hepatocellular carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, lymphoma, melanoma, meningioma, mesothelioma, small/large cell neuroendocrine lung cancer, neuroendocrine-pancreas cancer, merkel cell carcinoma, gastrointestinal carcinoid, lung carcinoid, clear cell adenocarcinoma, endometrioid adenocarcinoma, mucinous adenocarcinoma, serous adenocarcinoma, pancreatic adenocarcinoma, prostate adenocarcinoma, malignant fibrous histiocytoma, primitive neuroectodermal tumor, leiomyosarcoma, liposarcoma, osteosarcoma, synovial sarcoma, sex cord stromal tumor, basal cell carcinoma, skin squamous cell carcinoma, thymic carcinoma/thymoma, follicular/papillary carcinoma, medullary carcinoma, transitional cell carcinoma, adenocarcinoma of bladder, and squamous cell carcinoma of bladder, and said classifying is of the sample as containing or not containing tumor cells of an adrenal-cortical tumor, adrenal pheochromocytoma, tumor of the brain, adenocarcinoma of breast, cervical adenocarcinoma, cervical squamous cell carcinoma, cholangiocarcinoma, endometrial adenocarcinoma, esophageal squamous cell carcinoma, gastrointestinal stromal tumor, adenocarcinoma of gallbladder, gastro-esophageal adenocarcinoma, seminomatous germ cell tumor, nonseminomatous germ cell tumor, tumor of the salivary gland, squamous cell carcinoma, colorectal adenocarcinoma, small intestine adenocarcinoma, clear cell renal cell carcinoma, chromophobe renal cell carcinoma, papillary renal cell carcinoma, hepatocellular carcinoma, lung adenocarcinoma, lung squamous cell carcinoma, lymphoma, melanoma, meningioma, mesothelioma, small/large cell neuroendocrine lung cancer, neuroendocrine-pancreas cancer, merkel cell carcinoma, gastrointestinal carcinoid, lung carcinoid, clear cell adenocarcinoma, endometrioid adenocarcinoma, mucinous adenocarcinoma, serous adenocarcinoma, pancreatic adenocarcinoma, prostate adenocarcinoma, malignant fibrous histiocytoma, primitive neuroectodermal tumor, leiomyosarcoma, liposarcoma, osteosarcoma, synovial sarcoma, sex cord stromal tumor, basal cell carcinoma, skin squamous cell carcinoma, thymic carcinoma/thymoma, follicular/papillary carcinoma, medullary carcinoma, transitional cell carcinoma, adenocarcinoma of bladder, and squamous cell carcinoma of bladder.
4. The method of claim 1 wherein said 50 or more transcribed sequences comprise the disclosed 87 gene sequences.
5. The method of claim 1 wherein all said expression levels are determined by use of a microarray.
6. The method of claim 1 wherein said classifying is with an accuracy of 60% or higher.
7. The method of claim 1 wherein said amplifying comprises amplification of all or part of the transcribed sequences, or reverse transcription and labeling RNA corresponding to said transcribed sequences.
8. The method of claim 7 wherein said amplification comprises linear RNA amplification or quantitative PCR.
9. The method of claim 8 wherein said amplification is quantitative PCR amplification of at least 50 nucleotides of the transcribed sequences.
10. The method of claim 1, wherein cDNA copies of 50 to 100 transcribed sequences are produced.
11. The method of claim 1, wherein said sample is a formalin fixed, paraffin embedded (FFPE) sample.
12. The method of claim 1, wherein a majority of the expression levels of 50 or more transcribed sequences overlap in said plurality.
13. The method of claim 1, wherein 30 or more of the expression levels of 50 or more transcribed sequences overlap in said plurality.
14. The method of claim 1, wherein 35 or more of the expression levels of 50 or more transcribed sequences overlap in said plurality.
15. The method of claim 1, wherein 40 or more of the expression levels of 50 or more transcribed sequences overlap in said plurality.
16. The method of claim 10, wherein 55 or more of the expression levels of 50 or more transcribed sequences overlap in said plurality.
17. The method of claim 10, wherein 60 or more of the expression levels of 60 or more transcribed sequences overlap in said plurality.